AITopics | navigation performance

Collaborating Authors

navigation performance

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

58e6c003c9fb3992265005ff6aef1913-Paper-Conference.pdf

Neural Information Processing SystemsFeb-14-2026, 01:07:39 GMT

agent, backdoor attack, navigation, (14 more...)

Neural Information Processing Systems

Country: Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Security & Privacy (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Seeing through Uncertainty: Robust Task-Oriented Optimization in Visual Navigation

Pan, Yiyuan, Xu, Yunzhe, Liu, Zhe, Wang, Hesheng

arXiv.org Artificial IntelligenceOct-22-2025

Visual navigation is a fundamental problem in embodied AI, yet practical deployments demand long-horizon planning capabilities to address multi-objective tasks. A major bottleneck is data scarcity: policies learned from limited data often overfit and fail to generalize OOD. Existing neural network-based agents typically increase architectural complexity that paradoxically become counterproductive in the small-sample regime. This paper introduce NeuRO, a integrated learning-to-optimize framework that tightly couples perception networks with downstream task-level robust optimization. Specifically, NeuRO addresses core difficulties in this integration: (i) it transforms noisy visual predictions under data scarcity into convex uncertainty sets using Partially Input Convex Neural Networks (PICNNs) with conformal calibration, which directly parameterize the optimization constraints; and (ii) it reformulates planning under partial observability as a robust optimization problem, enabling uncertainty-aware policies that transfer across environments. Extensive experiments on both unordered and sequential multi-object navigation tasks demonstrate that NeuRO establishes SoTA performance, particularly in generalization to unseen environments. Our work thus presents a significant advancement for developing robust, generalizable autonomous agents.

artificial intelligence, machine learning, optimization problem, (16 more...)

arXiv.org Artificial Intelligence

2510.00441

Country: Asia > China (0.28)

Genre: Research Report (0.82)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Everyday Object Meets Vision-and-Language Navigation Agent via Backdoor

Neural Information Processing SystemsOct-10-2025, 03:24:28 GMT

VLN, achieved by implanting object-aware backdoors during the training phase. Tailored to the unique VLN nature of cross-modality and continuous decision-making, we propose a novel backdoored VLN paradigm: IPR Backdoor.

agent, backdoor attack, navigation, (14 more...)

Neural Information Processing Systems

Country: Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Security & Privacy (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Dream to Recall: Imagination-Guided Experience Retrieval for Memory-Persistent Vision-and-Language Navigation

Xu, Yunzhe, Pan, Yiyuan, Liu, Zhe

arXiv.org Artificial IntelligenceOct-10-2025

Vision-and-Language Navigation (VLN) requires agents to follow natural language instructions through environments, with memory-persistent variants demanding progressive improvement through accumulated experience. Existing approaches for memory-persistent VLN face critical limitations: they lack effective memory access mechanisms, instead relying on entire memory incorporation or fixed-horizon lookup, and predominantly store only environmental observations while neglecting navigation behavioral patterns that encode valuable decision-making strategies. We present Memoir, which employs imagination as a retrieval mechanism grounded by explicit memory: a world model imagines future navigation states as queries to selectively retrieve relevant environmental observations and behavioral histories. The approach comprises: 1) a language-conditioned world model that imagines future states serving dual purposes: encoding experiences for storage and generating retrieval queries; 2) Hybrid Viewpoint-Level Memory that anchors both observations and behavioral patterns to viewpoints, enabling hybrid retrieval; and 3) an experience-augmented navigation model that integrates retrieved knowledge through specialized encoders. Extensive evaluation across diverse memory-persistent VLN benchmarks with 10 distinctive testing scenarios demonstrates Memoir's effectiveness: significant improvements across all scenarios, with 5.4% SPL gains on IR2R over the best memory-persistent baseline, accompanied by 8.3x training speedup and 74% inference memory reduction. The results validate that predictive retrieval of both environmental and behavioral memories enables more effective navigation, with analysis indicating substantial headroom (73.3% vs 93.4% upper bound) for this imagination-guided paradigm. Code at https://github.com/xyz9911/Memoir.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2510.08553

Genre: Research Report (0.50)

Industry:

Education (0.46)
Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.93)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.57)

Add feedback

What Matters in RL-Based Methods for Object-Goal Navigation? An Empirical Study and A Unified Framework

Wang, Hongze, Sun, Boyang, Xing, Jiaxu, Yang, Fan, Hutter, Marco, Shah, Dhruv, Scaramuzza, Davide, Pollefeys, Marc

arXiv.org Artificial IntelligenceOct-3-2025

Object-Goal Navigation (ObjectNav) is a critical component toward deploying mobile robots in everyday, uncontrolled environments such as homes, schools, and workplaces. In this context, a robot must locate target objects in previously unseen environments using only its onboard perception. Success requires the integration of semantic understanding, spatial reasoning, and long-horizon planning, which is a combination that remains extremely challenging. While reinforcement learning (RL) has become the dominant paradigm, progress has spanned a wide range of design choices, yet the field still lacks a unifying analysis to determine which components truly drive performance. In this work, we conduct a large-scale empirical study of modular RL-based ObjectNav systems, decomposing them into three key components: perception, policy, and test-time enhancement. Through extensive controlled experiments, we isolate the contribution of each and uncover clear trends: perception quality and test-time strategies are decisive drivers of performance, whereas policy improvements with current methods yield only marginal gains. Building on these insights, we propose practical design guidelines and demonstrate an enhanced modular system that surpasses State-of-the-Art (SotA) methods by 6.6% on SPL and by a 2.7% success rate. We also introduce a human baseline under identical conditions, where experts achieve an average 98% success, underscoring the gap between RL agents and human-level navigation. Our study not only sets the SotA performance but also provides principled guidance for future ObjectNav development and evaluation. Recent advances in computer vision and deep learning have inspired growing interest in interdisciplinary applications that bridge perception, reasoning, and control, especially in robotics. Among these, vision-based navigation has emerged as a foundational capability for autonomous mobile agents. A key benchmark in this domain is Object-Goal Navigation (ObjectNav), where a robot must navigate to an instance of a specified object category in an unseen environment, relying solely on its onboard sensors. This task is both practically important and technically challenging: it requires semantic understanding, spatial reasoning, and long-horizon planning. Among many approaches, Reinforcement Learning (RL) has become a dominant paradigm for ObjectNav, offering a structured framework to learn directly through trial-and-error and showing steady progress across various benchmarks. While end-to-end RL policies are common, modular RL approaches have shown greater robustness and improved generalization.

machine learning, navigation, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2510.0183

Genre: Research Report > New Finding (0.93)

Industry: Education > Educational Setting (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs

Zhang, Yue, Ma, Tianyi, Wang, Zun, Qiao, Yanyuan, Kordjamshidi, Parisa

arXiv.org Artificial IntelligenceSep-30-2025

Integrating large language models (LLMs) into embodied AI models is becoming increasingly prevalent. However, existing zero-shot LLM-based Vision-and-Language Navigation (VLN) agents either encode images as textual scene descriptions, potentially oversimplifying visual details, or process raw image inputs, which can fail to capture abstract semantics required for high-level reasoning. In this paper, we improve the navigation agent's contextual understanding by incorporating textual descriptions from multiple perspectives that facilitate analogical reasoning across images. By leveraging text-based analogical reasoning, the agent enhances its global scene understanding and spatial reasoning, leading to more accurate action decisions. We evaluate our approach on the R2R dataset, where our experiments demonstrate significant improvements in navigation performance.

artificial intelligence, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.25139

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Spatially-Enhanced Recurrent Memory for Long-Range Mapless Navigation via End-to-End Reinforcement Learning

Yang, Fan, Frivik, Per, Hoeller, David, Wang, Chen, Cadena, Cesar, Hutter, Marco

arXiv.org Artificial IntelligenceSep-5-2025

Recent advancements in robot navigation, particularly with end-to-end learning approaches such as reinforcement learning (RL), have demonstrated strong performance. However, successful navigation still depends on two key capabilities: mapping and planning (explicitly or implicitly). Classical approaches rely on explicit mapping pipelines to register egocentric observations into a coherent map. In contrast, end-to-end learning often achieves this implicitly -- through recurrent neural networks (RNNs) that fuse current and historical observations into a latent space for planning. While existing architectures, such as LSTM and GRU, can capture temporal dependencies, our findings reveal a critical limitation: their inability to effectively perform spatial memorization. This capability is essential for integrating sequential observations from varying perspectives to build spatial representations that support planning. To address this, we propose Spatially-Enhanced Recurrent Units (SRUs) -- a simple yet effective modification to existing RNNs -- that enhance spatial memorization. We further introduce an attention-based network architecture integrated with SRUs, enabling long-range mapless navigation using a single forward-facing stereo camera. We also employ regularization techniques to facilitate robust end-to-end recurrent training via RL. Experimental results show 23.5% overall improvement in long-range navigation compared to existing RNNs. With SRU memory, our method outperforms RL baselines -- one relying on explicit mapping and the other on stacked historical observations -- by 29.6% and 105.0%, respectively, across diverse environments requiring long-horizon mapping and memorization. Finally, we address the sim-to-real gap by leveraging large-scale pretraining on synthetic depth data, enabling zero-shot transfer for deployment across diverse and complex real-world environments.

artificial intelligence, machine learning, robot, (18 more...)

arXiv.org Artificial Intelligence

2506.05997

Country: Europe > Switzerland (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (0.46)
Transportation (0.46)
Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Contact-Aided Navigation of Flexible Robotic Endoscope Using Deep Reinforcement Learning in Dynamic Stomach

Ng, Chi Kit, Gao, Huxin, Ren, Tian-Ao, Lai, Jiewen, Ren, Hongliang

arXiv.org Artificial IntelligenceSep-3-2025

-- Navigating a flexible robotic endoscope (FRE) through the gastrointestinal tract is critical for surgical diagnosis and treatment. However, navigation in the dynamic stomach is particularly challenging because the FRE must learn to effectively use contact with the deformable stomach walls to reach target locations. T o address this, we introduce a deep reinforcement learning (DRL) based Contact-Aided Navigation (CAN) strategy for FREs, leveraging contact force feedback to enhance motion stability and navigation precision. The training environment is established using a physics-based finite element method (FEM) simulation of a deformable stomach. Trained with the Proximal Policy Optimization (PPO) algorithm, our approach achieves high navigation success rates (within 3 mm error between the FRE's end-effector and target) and significantly outperforms baseline policies. In both static and dynamic stomach environments, the CAN agent achieved a 100% success rate with 1.6 mm average error, and it maintained an 85% success rate in challenging unseen scenarios with stronger external disturbances. These results validate that the DRL-based CAN strategy substantially enhances FRE navigation performance over prior methods.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2509.00319

Country: Asia > China (0.14)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Gastroenterology (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

SE-VLN: A Self-Evolving Vision-Language Navigation Framework Based on Multimodal Large Language Models

Dong, Xiangyu, Zhao, Haoran, Gao, Jiang, Li, Haozhou, Ma, Xiaoguang, Zhou, Yaoming, Chen, Fuhai, Liu, Juan

arXiv.org Artificial IntelligenceAug-27-2025

Recent advances in vision-language navigation (VLN) were mainly attributed to emerging large language models (LLMs). These methods exhibited excellent generalization capabilities in instruction understanding and task reasoning. However, they were constrained by the fixed knowledge bases and reasoning abilities of LLMs, preventing fully incorporating experiential knowledge and thus resulting in a lack of efficient evolutionary capacity. To address this, we drew inspiration from the evolution capabilities of natural agents, and proposed a self-evolving VLN framework (SE-VLN) to endow VLN agents with the ability to continuously evolve during testing. To the best of our knowledge, it was the first time that an multimodal LLM-powered self-evolving VLN framework was proposed. Specifically, SE-VLN comprised three core modules, i.e., a hierarchical memory module to transfer successful and failure cases into reusable knowledge, a retrieval-augmented thought-based reasoning module to retrieve experience and enable multi-step decision-making, and a reflection module to realize continual evolution. Comprehensive tests illustrated that the SE-VLN achieved navigation success rates of 57% and 35.2% in unseen environments, representing relative performance improvements of 23.9% and 15.0% over current state-of-the-art methods on R2R and REVERSE datasets, respectively. Moreover, the SE-VLN showed performance improvement with increasing experience repository, elucidating its great potential as a self-evolving agent framework for VLN.

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2507.13152

Country: Asia > China (0.69)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Filters

Collaborating Authors

navigation performance

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

2e5c2cb8d13e8fba78d95211440ba326-Supplemental.pdf

58e6c003c9fb3992265005ff6aef1913-Paper-Conference.pdf

Seeing through Uncertainty: Robust Task-Oriented Optimization in Visual Navigation

Everyday Object Meets Vision-and-Language Navigation Agent via Backdoor

Dream to Recall: Imagination-Guided Experience Retrieval for Memory-Persistent Vision-and-Language Navigation

What Matters in RL-Based Methods for Object-Goal Navigation? An Empirical Study and A Unified Framework

Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs

Spatially-Enhanced Recurrent Memory for Long-Range Mapless Navigation via End-to-End Reinforcement Learning

Contact-Aided Navigation of Flexible Robotic Endoscope Using Deep Reinforcement Learning in Dynamic Stomach

SE-VLN: A Self-Evolving Vision-Language Navigation Framework Based on Multimodal Large Language Models